Computational Biology Lecture 11: Pairwise alignment using HMMs
Abstract
Previously, we looked at various alignment algorithms with different scoring schemes. We argued that the score of an alignment reflects the relative likelihood that the two sequences are related rather than unrelated, and we used the log-odds ratio to express this relative likelihood while keeping the scoring scheme additive. Maximizing the score of an alignment was therefore, in some sense, equivalent to maximizing the log-odds ratio, except that gaps were scored separately and were not tied to the log-odds ratio. Recall that in deriving the scores that relate to the log-odds ratio we considered only ungapped alignments, and assumed a separate model for scoring gaps, for instance an affine gap penalty function. Now we will unify both models into a single probabilistic model and see how the score of a gapped alignment computed by the Needleman-Wunsch algorithm can be viewed as a maximum log-odds ratio obtained by the Viterbi algorithm for an HMM that generates two sequences simultaneously.
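To make the idea of "an HMM that generates two sequences simultaneously" concrete, here is a minimal sketch of Viterbi decoding for a three-state pair HMM: state M emits an aligned pair (x_i, y_j), state X emits x_i against a gap, and state Y emits y_j against a gap. Everything specific here is an assumption for illustration: the DNA alphabet, the function names `viterbi_pair_hmm`, `p_match` and `q_gap`, and the toy values of `DELTA` (gap opening), `EPSILON` (gap extension) and `TAU` (termination). The sketch returns a Viterbi log-probability rather than a log-odds score; the null (random) model that would turn it into a log-odds ratio is omitted.

```python
import math

# Hypothetical toy parameters: assumptions for illustration, not values from the lecture.
ALPHABET = "ACGT"
DELTA = 0.2    # M -> X or M -> Y (gap opening)
EPSILON = 0.1  # X -> X or Y -> Y (gap extension)
TAU = 0.1      # any state -> End

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def p_match(a, b):
    """Hypothetical joint emission p_ab in state M (sums to 1 over the 16 pairs)."""
    return 0.2 if a == b else 0.2 / 12


def q_gap(a):
    """Hypothetical background emission q_a in states X and Y (uniform)."""
    return 1.0 / len(ALPHABET)


def viterbi_pair_hmm(x, y):
    """Return (log-probability, aligned x, aligned y) of the most probable path."""
    n, m = len(x), len(y)
    t_mm = log(1 - 2 * DELTA - TAU)  # M -> M
    t_gm = log(1 - EPSILON - TAU)    # X -> M or Y -> M
    t_mg = log(DELTA)                # M -> X or M -> Y
    t_gg = log(EPSILON)              # X -> X or Y -> Y
    # V[s][i][j]: best log-probability of a path ending in state s after
    # emitting x[:i] and y[:j]; back[s][i][j]: the state it came from.
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # the Begin state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # M emits the pair (x_i, y_j)
                c = {"M": V["M"][i - 1][j - 1] + t_mm,
                     "X": V["X"][i - 1][j - 1] + t_gm,
                     "Y": V["Y"][i - 1][j - 1] + t_gm}
                prev = max(c, key=c.get)
                V["M"][i][j] = log(p_match(x[i - 1], y[j - 1])) + c[prev]
                back["M"][i][j] = prev
            if i > 0:  # X emits x_i against a gap in y
                c = {"M": V["M"][i - 1][j] + t_mg, "X": V["X"][i - 1][j] + t_gg}
                prev = max(c, key=c.get)
                V["X"][i][j] = log(q_gap(x[i - 1])) + c[prev]
                back["X"][i][j] = prev
            if j > 0:  # Y emits y_j against a gap in x
                c = {"M": V["M"][i][j - 1] + t_mg, "Y": V["Y"][i][j - 1] + t_gg}
                prev = max(c, key=c.get)
                V["Y"][i][j] = log(q_gap(y[j - 1])) + c[prev]
                back["Y"][i][j] = prev
    final = {s: V[s][n][m] for s in "MXY"}
    state = max(final, key=final.get)
    score = final[state] + log(TAU)  # transition into the End state
    # Trace back from (n, m) to (0, 0), building the alignment in reverse.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
        state = prev
    return score, "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `viterbi_pair_hmm("ACGT", "AGT")` returns the alignment `ACGT` over `A-GT`. Working in log space turns products of probabilities into sums, giving the same additive structure as Needleman-Wunsch scores, and the separate opening (`DELTA`) and extension (`EPSILON`) transitions play the role of an affine gap penalty.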